Urine Metabolomics Pancreatitis Differential Results

Author

Robert M Flight

Published

2024-06-27 13:42

Purpose

Differential analysis of pancreatitis urine metabolites between Green I and Red patient groups. There is an Executive Summary at the end of this document.

Data

Median normalized metabolite intensities, or metabolite - metabolite ratio data.

Methods

Metabolite abundances in each sample were normalized by calculating the median abundance in each sample, and dividing the abundances by the sample median. Metabolite abundances include only a few zero values (N = 4 across all metabolites and samples) that were subsequently treated as missing. Sample - sample correlations were calculated using information-content-informed Kendall-Tau (ICI-Kt), and median sample correlations used to detect possible outlier samples. No outlier samples were detected, and all samples were used for differential calculations. Prior to calculation of metabolite - metabolite ratios, any missing values were replaced with the lowest observed value for that metabolite across samples. For each normalized metabolite or metabolite - metabolite ratio, a t-test was calculated using the \(log_2\) values in the Red and Green I patients. Any missing values were removed prior to the t-test calculation. P-values were adjusted using the Benjamini-Hochberg procedure (Benjamini and Hochberg 1995).
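The normalization step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the analysis: the function names and the example intensities are hypothetical, and it assumes zeros are set to missing before the sample median is computed, which the text does not state explicitly.

```python
import math
from statistics import median


def median_normalize(sample):
    """Divide each abundance in one sample by that sample's median.

    Zeros are treated as missing (None) and are ignored when the
    median is computed (an assumption about the order of operations).
    """
    cleaned = {m: (v if v else None) for m, v in sample.items()}
    observed = [v for v in cleaned.values() if v is not None]
    sample_median = median(observed)
    return {
        m: (None if v is None else v / sample_median)
        for m, v in cleaned.items()
    }


def log2_or_none(value):
    """log2 transform that passes missing values through unchanged."""
    return None if value is None else math.log2(value)


# Hypothetical intensities for a single sample (not study data).
sample = {"mannose": 400.0, "ribose": 100.0, "fucose": 0.0, "sucrose": 200.0}
normalized = median_normalize(sample)
log_values = {m: log2_or_none(v) for m, v in normalized.items()}
```

After normalization, a metabolite at the sample median has a value of 1, and therefore a log2 value of 0.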

For association with other patient covariates in the Red and Green I samples, ANOVA was used to test the \(log_2\) values against the covariate, regardless of whether the covariate had two or more than two categories. Missing values were removed prior to the ANOVA calculation. Some covariates necessitated special handling prior to ANOVA. Diabetes status (diabetes_bl) required changing all of the “N/A” instances to “Normal”. Etiology status (etiology) required removing all of the “missing” instances (encoded as -999) before the ANOVA.
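The covariate handling and the one-way ANOVA statistic can be sketched as follows. This is an illustrative sketch only: the helper names, category labels and values are hypothetical. It computes just the F statistic, because turning F into a p-value requires the F distribution, which the Python standard library does not provide.

```python
def recode_diabetes(values):
    """Replace "N/A" diabetes entries with "Normal"."""
    return ["Normal" if v == "N/A" else v for v in values]


def drop_missing(log_values, covariate, missing_code=-999):
    """Keep (value, covariate) pairs where neither entry is missing."""
    return [
        (x, c)
        for x, c in zip(log_values, covariate)
        if x is not None and c != missing_code
    ]


def one_way_f(pairs):
    """One-way ANOVA F statistic for (value, group) pairs."""
    groups = {}
    for x, c in pairs:
        groups.setdefault(c, []).append(x)
    all_values = [x for x, _ in pairs]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical labels and values (not study data).
diabetes = recode_diabetes(["N/A", "Diabetic", "Normal"])
log_values = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, None, 5.0]
etiology = [1, 1, 1, 2, 2, 2, 1, -999]
pairs = drop_missing(log_values, etiology)
f_stat = one_way_f(pairs)
```

With only two categories, the F statistic equals the square of the equal-variance t statistic, which is why a single ANOVA routine can serve covariates with two or more levels.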

Results

Direct Comparison

From the direct comparison of metabolites in the Green I and Red patients, 29 metabolites had an adjusted p-value <= 0.05. The volcano plot of log-fold-changes and p-values is shown in Figure 1. The red line corresponds to an adjusted p-value of 0.05.
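For readers unfamiliar with the adjustment behind the 0.05 cutoff, the Benjamini-Hochberg procedure can be sketched as below. This is a minimal standard-library version for illustration, with hypothetical p-values; it is not the code used to produce the reported counts.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values (not study data).
raw = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)
n_significant = sum(p <= 0.05 for p in adjusted)
```

Note that a raw p-value below 0.05 (here 0.03 and 0.04) can have an adjusted value above 0.05.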

The statistical results are shown in Table 1 (named metabolites) and Table 2 (un-named metabolites).

Table 1. Statistics of direct comparisons of named metabolites for patients in control and CP disease groups.

Metabolite LogFC CP Control p.value p.adjust n_CP n_Control
Phenoxyacetic acid −1.00 −2.58 −1.59 1.81 × 10^−4 9.56 × 10^−3 40 39
3-aminoisobutyric acid −0.80 0.50 1.30 3.33 × 10^−3 4.47 × 10^−2 40 39
Citrulline −0.38 −0.83 −0.45 1.53 × 10^−3 2.95 × 10^−2 40 39
Isoleucine −0.38 −0.63 −0.25 3.75 × 10^−3 4.55 × 10^−2 40 39
Fucose 0.51 3.02 2.51 3.54 × 10^−3 4.47 × 10^−2 40 39
Ribose 0.58 1.09 0.51 3.07 × 10^−3 4.38 × 10^−2 40 39
Xylulose 0.73 0.95 0.21 2.73 × 10^−4 1.12 × 10^−2 40 39
Lactose 0.78 2.09 1.31 8.41 × 10^−4 2.13 × 10^−2 40 39
Cellobiose 0.79 1.70 0.91 2.71 × 10^−3 4.23 × 10^−2 40 39
Ribonic acid 0.87 1.08 0.21 1.39 × 10^−3 2.85 × 10^−2 40 39
Sucrose 1.24 0.85 −0.39 2.97 × 10^−3 4.38 × 10^−2 40 39
6-deoxyglucitol 1.43 2.20 0.77 3.42 × 10^−3 4.47 × 10^−2 40 39
Mannose 3.67 0.57 −3.10 5.89 × 10^−6 1.93 × 10^−3 40 39

Table 2. Statistics of direct comparisons of un-named metabolites for patients in control and CP disease groups.

Metabolite LogFC CP Control p.value p.adjust n_CP n_Control
228072 −1.25 −3.61 −2.36 2.36 × 10^−5 3.87 × 10^−3 40 39
345452 −1.21 −2.79 −1.58 2.04 × 10^−4 9.56 × 10^−3 40 39
41866 −0.94 −2.45 −1.51 4.10 × 10^−3 4.72 × 10^−2 40 39
33983 −0.75 −2.27 −1.52 1.11 × 10^−3 2.42 × 10^−2 40 39
345144 −0.61 −2.61 −2.00 1.69 × 10^−3 3.07 × 10^−2 40 39
2091 −0.50 3.96 4.46 8.42 × 10^−4 2.13 × 10^−2 40 39
2082 −0.49 0.20 0.69 9.23 × 10^−5 7.57 × 10^−3 40 39
473743 −0.47 1.35 1.82 2.30 × 10^−3 3.77 × 10^−2 40 39
105379 0.58 −1.00 −1.58 4.18 × 10^−3 4.72 × 10^−2 40 39
133590 0.60 0.71 0.12 5.98 × 10^−4 2.13 × 10^−2 40 39
4901 0.70 1.17 0.47 1.07 × 10^−3 2.42 × 10^−2 40 39
473342 0.75 −1.35 −2.10 7.65 × 10^−4 2.13 × 10^−2 40 39
11353 0.76 0.80 0.04 1.20 × 10^−4 7.85 × 10^−3 40 39
34113 0.79 0.87 0.08 2.11 × 10^−3 3.65 × 10^−2 40 39
31764 1.35 0.64 −0.72 7.94 × 10^−4 2.13 × 10^−2 40 39
18242 1.68 −0.07 −1.75 6.65 × 10^−5 7.27 × 10^−3 40 39

Figure 1. Volcano plot of Red - Green I differences. Red line indicates an adjusted p-value of 0.05.

For each of the metabolites with an adjusted p-value <= 0.05, we can plot the distribution of values in the control and chronic pancreatitis (CP) patients to verify the fold-change and p-values.
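As a complement to the plots, the fold-change and test statistic for a single metabolite can be sketched as below. The LogFC column in Tables 1 and 2 is consistent with the CP value minus the Control value (for mannose, 0.57 − (−3.10) = 3.67). The sketch uses Welch's unequal-variance t statistic, which is an assumption because the report does not state which t-test variant was used. The input values are hypothetical, and the p-value is not computed because the standard library has no t distribution.

```python
import math
from statistics import mean, variance


def log_fold_change(cp, control):
    """Difference of group means of log2 values (CP minus Control)."""
    return mean(cp) - mean(control)


def welch_t(cp, control):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = variance(cp) / len(cp)
    v2 = variance(control) / len(control)
    t = (mean(cp) - mean(control)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (
        v1 ** 2 / (len(cp) - 1) + v2 ** 2 / (len(control) - 1)
    )
    return t, df


# Hypothetical log2 abundances (not study data).
cp = [1.0, 2.0, 3.0]
control = [0.0, 1.0, 2.0]
lfc = log_fold_change(cp, control)
t_stat, df = welch_t(cp, control)

# Consistency check against the mannose row of Table 1.
mannose_lfc = round(0.57 - (-3.10), 2)
```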

Figure 2. Raincloud plots of the log2(metabolite) abundances for each group for each named metabolite with a negative log-fold-change. The raincloud plot is a combination of 3 plots: 1 - the original data points; 2 - a boxplot of the distribution; and 3 - a density estimate.

Figure 3. Raincloud plots of the log2(metabolite) abundances for each group for each named metabolite with a positive log-fold-change. The raincloud plot is a combination of 3 plots: 1 - the original data points; 2 - a boxplot of the distribution; and 3 - a density estimate.

Figure 4. Raincloud plots of the log2(metabolite) abundances for each group for each un-named metabolite. The raincloud plot is a combination of 3 plots: 1 - the original data points; 2 - a boxplot of the distribution; and 3 - a density estimate.

ROCs for Differential Metabolites

Figure 6. ROC for the named, significant metabolites with a negative log-fold-change.

Figure 5. ROC for the named, significant metabolites with a positive log-fold-change.

Figure 7. ROC for the unnamed, significant metabolites.
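Each ROC curve can be summarized by its area under the curve (AUC), which equals the probability that a randomly chosen patient from one group has a higher value than a randomly chosen patient from the other. The sketch below computes the AUC that way. It is an illustration with hypothetical values and a hypothetical function name, not the code used to draw the figures.

```python
def roc_auc(cases, controls):
    """AUC as the fraction of (case, control) pairs where the case is
    higher, counting ties as one half."""
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical log2 abundances (not study data).
cases = [3.0, 4.0, 5.0]
controls = [1.0, 2.0, 4.0]
auc = roc_auc(cases, controls)

# For a metabolite with a negative log-fold-change the raw AUC falls
# below 0.5, so the groups can be swapped to report the separation.
auc_swapped = roc_auc(controls, cases)
```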

Multiway ANOVA

One concern is that some of the metabolite differences are due not just to the disease cohort, but also to the diabetic status and / or gender of the patient. A simple way to check this is to replace the t-test on disease cohort with a multiway ANOVA that includes disease, diabetes, gender and their interactions in the model. For just this analysis, we replaced the diabetes status of “N/A” with “No”; the justification for this is in the QC/QA document.

In R, the model is specified as:

    aov(log_intensity ~ cohort_c + diabetes_bl + gender + cohort_c:diabetes_bl + cohort_c:gender)

This model includes all of the factors that we think are potentially contributing, and allows us to get p-values directly for each term and their interactions. Caveat: ANOVA makes different assumptions around normality than the t-test, so the set of metabolites that are significant by ANOVA will be slightly different from those returned by the t-test.

More importantly, the metabolites with an adjusted p-value <= 0.05 for cohort_c in the ANOVA model do not overlap with those significant for any other term. In fact, the only other term with an adjusted p-value <= 0.05 is diabetes status, for myo-inositol.

Metabolite - Metabolite Ratio Comparisons

Compared to the direct comparison, the ratio results had far more significant entries. To keep them tractable, we used a more stringent adjusted p-value cutoff of 0.01.
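For reference, the ratio construction described in the Methods (missing values replaced by the lowest observed value for that metabolite, then a ratio for each metabolite pair) can be sketched as follows. The function names and data are hypothetical. Computing each unordered pair once, in a single direction, is an assumption, because the report does not say how pairs were enumerated.

```python
import math
from itertools import combinations


def impute_minimum(values):
    """Replace missing entries with the lowest observed value."""
    lowest = min(v for v in values if v is not None)
    return [lowest if v is None else v for v in values]


def log2_ratios(abundances):
    """log2(m1 / m2) across samples for every unordered metabolite pair."""
    filled = {m: impute_minimum(v) for m, v in abundances.items()}
    ratios = {}
    for m1, m2 in combinations(filled, 2):
        ratios[(m1, m2)] = [
            math.log2(a / b) for a, b in zip(filled[m1], filled[m2])
        ]
    return ratios


# Hypothetical normalized abundances for three samples (not study data).
abundances = {
    "a": [1.0, None, 4.0],
    "b": [2.0, 2.0, 1.0],
    "c": [4.0, 8.0, 2.0],
}
ratios = log2_ratios(abundances)
```

With M metabolites this produces M(M − 1)/2 ratios, which is one reason the ratio analysis yields so many more tests, and significant entries, than the direct comparison.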

Table 3. Statistics of metabolite - metabolite ratios comparison for patients in Green I and Red disease groups. The top 20 ranked ratios by absolute log-fold-change are shown here after removing mannose from the results (N significant 294). Un-identified metabolite 18242 also has a large number of significant ratios (N significant 144).

metabolite1 metabolite2 LogFC x6 x1 p.value p.adjust
228072 18242 −2.93 −3.54 −0.61 2.82 × 10^−7 1.41 × 10^−3
345452 18242 −2.89 −2.72 0.17 8.22 × 10^−6 2.01 × 10^−3
33386 18242 −2.87 −0.15 2.72 4.82 × 10^−6 1.71 × 10^−3
354281 18242 −2.73 −0.45 2.27 1.84 × 10^−5 2.87 × 10^−3
phenoxyacetic acid 18242 −2.68 −2.52 0.16 1.56 × 10^−6 1.41 × 10^−3
41866 18242 −2.61 −2.38 0.24 3.11 × 10^−5 3.68 × 10^−3
228072 31764 −2.60 −4.24 −1.64 1.88 × 10^−6 1.41 × 10^−3
345452 31764 −2.57 −3.43 −0.86 6.26 × 10^−5 5.03 × 10^−3
41924 18242 −2.55 −0.41 2.14 3.85 × 10^−5 4.06 × 10^−3
33386 31764 −2.55 −0.85 1.69 5.51 × 10^−5 4.77 × 10^−3
31559 18242 −2.50 1.29 3.79 2.53 × 10^−6 1.52 × 10^−3
32284 18242 −2.49 −1.50 0.99 4.91 × 10^−6 1.72 × 10^−3
3-aminoisobutyric acid 18242 −2.48 0.57 3.05 6.98 × 10^−6 1.85 × 10^−3
sucrose 228072 2.49 4.46 1.97 2.77 × 10^−5 3.43 × 10^−3
31764 4983 2.55 −0.51 −3.06 1.97 × 10^−4 9.22 × 10^−3
6-deoxyglucitol 33386 2.62 2.42 −0.20 1.12 × 10^−4 6.90 × 10^−3
6-deoxyglucitol 4983 2.63 1.06 −1.57 2.27 × 10^−4 9.85 × 10^−3
6-deoxyglucitol 228072 2.68 5.81 3.13 2.48 × 10^−5 3.26 × 10^−3
18242 2424 2.79 −1.27 −4.06 5.59 × 10^−5 4.81 × 10^−3
18242 4983 2.88 −1.22 −4.09 2.28 × 10^−5 3.13 × 10^−3

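Alternatively, we can sum the absolute log-fold-changes for each metabolite across the significant ratios it appears in, to get a ranked list of the metabolites involved in the significant ratios. Those that appeared in at least 10 significant ratios are shown in Table 4.

Table 4. Summed absolute log-fold-changes (LogFC_sum) and number of significant ratios (N_ratio) for each metabolite appearing in at least 10 significant ratios.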

metabolite LogFC_sum N_ratio
mannose 1,110.4 294
18242 290.2 144
228072 192.4 113
345452 107.0 60
phenoxyacetic acid 100.3 62
31764 97.1 52
11353 83.2 67
xylulose 77.9 61
473342 74.2 57
lactose 58.5 44
33983 58.1 40
ribonic acid 53.9 37
133590 47.3 41
4901 38.9 30
2082 36.9 31
glycerol-3-galactoside 36.9 21
cellobiose 36.7 27
345144 35.6 24
3-aminoisobutyric acid 35.6 20
34113 32.9 21
1684 32.7 19
sucrose 32.6 16
61 32.3 22
6-deoxyglucitol 31.1 14
509528 30.7 19
41866 27.6 14
ribose 27.4 22
31559 26.2 14
3-hydroxyphenylacetic acid 26.0 15
16747 25.5 20
33386 25.2 10
fucose 25.0 21
105379 25.0 19
196521 24.5 20
citrulline 23.5 19
1,2-anhydro-myo-inositol 23.2 14
2-hydroxyvaleric acid 21.4 14
2091 20.4 14
lactic acid 20.0 13
alpha-aminoadipic acid 19.7 13
32247 19.5 13
473743 18.9 13
33410 18.9 13
glycolic acid 18.4 12
uracil 18.0 11
isoleucine 18.0 12
serine 18.0 12
cystine 17.3 10
N-alpha-acetyllysine 17.1 12
4-hydroxybenzoate 16.8 10
224100 16.4 12
342750 16.1 11
342512 15.7 10
199094 15.4 10
lyxitol 14.6 11
1799 14.4 10
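The per-metabolite summary above can be sketched as follows. The input is a small subset of four rows taken from Table 3, so the totals are for illustration only and do not reproduce the values in the table above. The function name is hypothetical.

```python
from collections import defaultdict


def summarize_ratios(ratio_rows):
    """Sum |LogFC| and count ratios for every metabolite that appears
    as either member of a significant ratio."""
    logfc_sum = defaultdict(float)
    n_ratio = defaultdict(int)
    for m1, m2, logfc in ratio_rows:
        for metabolite in (m1, m2):
            logfc_sum[metabolite] += abs(logfc)
            n_ratio[metabolite] += 1
    return dict(logfc_sum), dict(n_ratio)


# Four rows copied from Table 3 (metabolite1, metabolite2, LogFC).
rows = [
    ("228072", "18242", -2.93),
    ("345452", "18242", -2.89),
    ("228072", "31764", -2.60),
    ("18242", "4983", 2.88),
]
logfc_sum, n_ratio = summarize_ratios(rows)

# Rank metabolites by summed absolute log-fold-change, largest first.
ranked = sorted(logfc_sum, key=logfc_sum.get, reverse=True)
```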

Figure 8. Volcano plot of Red - Green I metabolite-metabolite ratio differences. Red line indicates an adjusted p-value of 0.01.

Multiway ANOVA

In contrast to the direct comparisons, the multiway ANOVA using ratios did yield significant results for terms other than cohort. However, they were very few compared to the number of ratios significant using the t-test (1257). In particular, the combined cohort and diabetes term had 6 significant ratios, and the diabetes term had 9 significant ratios.

Similarity of Ratio and Direct Comparisons

We can also ask whether the significant ratios involve any metabolites beyond those significant in the direct comparison. We collected all of the metabolites in the significant ratio results, and then checked whether the members of each pair came from the direct comparison results. There were 47 significant ratios where one member was not from the direct comparison results.
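The membership check can be sketched with sets. The metabolite identifiers below are taken from Tables 1 to 3, but they are only a small subset of the results, so the counts are illustrative and do not reproduce the 47 reported above. The variable names are hypothetical.

```python
# A subset of the metabolites significant in the direct comparison
# (from Tables 1 and 2).
direct_significant = {
    "228072", "18242", "31764",
    "phenoxyacetic acid", "6-deoxyglucitol", "sucrose",
}

# A subset of the significant ratios (from Table 3).
significant_ratios = [
    ("228072", "18242"),
    ("18242", "4983"),
    ("6-deoxyglucitol", "33386"),
    ("31764", "4983"),
]

# Ratios where at least one member is not in the direct results.
ratios_with_new_member = [
    pair for pair in significant_ratios
    if any(m not in direct_significant for m in pair)
]

# The metabolites that only show up through the ratio analysis.
new_metabolites = {
    m for pair in significant_ratios for m in pair
} - direct_significant
```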

Feature - Feature Correlations in Green I & Red

For each feature, we calculated the ICI-Kt correlations amongst all features across the Green I and Red samples. We then trim to just the significant features, to see if anything behaves very similarly. The heatmap of correlations is shown in Figure 9.
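To give a sense of what such a correlation measures, the sketch below computes a plain Kendall tau between two features, with missing values replaced by a value below everything observed. That mirrors the general idea behind ICI-Kt as we understand it (missing values are treated as low rather than discarded). It is an assumption-laden illustration and not the ICI-Kt implementation used for this report, which handles ties and scaling differently. The function name and data are hypothetical.

```python
from itertools import combinations


def kendall_tau_missing_low(x, y):
    """Kendall tau-a where missing values rank below all observed ones."""
    observed = [v for v in x + y if v is not None]
    floor = min(observed) - 1.0  # any value below the observed minimum
    xf = [floor if v is None else v for v in x]
    yf = [floor if v is None else v for v in y]
    concordant = discordant = 0
    for i, j in combinations(range(len(xf)), 2):
        sign = (xf[i] - xf[j]) * (yf[i] - yf[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xf) * (len(xf) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical abundances of two features across four samples.
feature_x = [1.0, 2.0, None, 4.0]
feature_y = [1.5, None, 0.5, 3.0]
tau = kendall_tau_missing_low(feature_x, feature_y)
```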

Figure 9. Feature-feature ICI-Kt correlations in the Green I and Red patient samples.

The heatmap suggests that there are clusters of features with similar feature - feature correlation patterns.

We can also plot the heatmap of feature abundances across the Green I and Red patients, and use the above ordering to order the features, as shown in Figure 10.

Figure 10. Significant metabolite abundances by patient, clustered by ICI-Kt correlation.

Metabolites Differential for Other Covariates

In addition to testing for statistical differences with disease, we can also test for statistical differences with other patient covariates. These included:

  • Age
  • Smoking Status
  • Race
  • BMI Group
  • Gender
  • Etiology
  • Drinking Status
  • DXA Result
  • Diabetes Status

The UpSet plot in Figure 11 shows that the differential metabolites overlap primarily with those significant for diabetes status, with a few others shared with age and smoking status. The metabolites in each column (each combination of covariates) are listed in Table 5.

Figure 11. UpSet plot of significant metabolites from pancreatitis stage (Green I and Red), and other patient covariates. Columns denote the intersection of covariates, and how many metabolites are shared between that intersection. Rows indicate which covariate, as well as the number of significant metabolites in that covariate. Only metabolites significant in two or more covariates are considered here.

Metabolite Pancreatitis Stage Diabetes Status Age Smoking Status Drinking Status BMI
Group 1
11353 0.0078 0.00042 0.024 0.035 0.24 0.27
Group 2
isoleucine 0.046 0.013 0.024 0.12 0.10 0.45
18242 0.0073 0.0015 0.038 0.074 0.10 0.46
4901 0.024 0.0039 0.024 0.26 0.46 0.19
Group 3
2091 0.021 0.025 0.29 0.036 0.19 0.19
Group 4
354281 0.056 0.025 0.57 0.040 0.16 0.00054
Group 5
cellobiose 0.042 0.0018 0.088 0.090 0.53 0.47
citrulline 0.030 0.0018 0.097 0.27 0.60 0.15
lactose 0.021 0.00042 0.088 0.097 0.38 0.19
mannose 0.0019 0.0022 0.20 0.15 0.45 0.92
ribonic acid 0.028 0.049 0.39 0.26 0.12 0.43
ribose 0.044 0.025 0.35 0.067 0.51 0.66
105379 0.047 0.010 0.33 0.62 0.34 0.76
133590 0.021 0.0018 0.13 0.22 0.60 0.19
2082 0.0076 0.0014 0.086 0.056 0.28 0.77
3-aminoisobutyric acid 0.045 0.017 0.088 0.18 0.51 0.054
31764 0.021 0.011 0.27 0.080 0.22 0.54
34113 0.036 0.011 0.086 0.062 0.34 0.21
345452 0.0096 0.0018 0.15 0.29 0.51 0.79
41866 0.047 0.0066 0.23 0.33 0.74 0.71
473342 0.021 0.028 0.33 0.34 0.50 0.86
473743 0.038 0.0094 0.33 0.090 0.19 0.39
6-deoxyglucitol 0.045 0.039 0.42 0.29 0.25 0.76
xylulose 0.011 0.020 0.19 0.095 0.11 0.19
Group 6
ascorbic acid 0.29 0.020 0.038 0.70 0.92 0.21
threonine 0.11 0.023 0.038 0.097 0.38 0.65
uracil 0.064 0.0014 0.025 0.10 0.47 0.19
2-hydroxyvaleric acid 0.15 0.025 0.038 0.43 0.51 0.88
Group 7
106904 0.12 0.032 0.49 0.0080 0.36 0.46
509528 0.059 0.041 0.097 0.010 0.54 0.14
892 0.095 0.0058 0.33 0.036 0.38 0.10
Group 8
32529 0.13 0.13 0.45 0.044 0.040 0.39

Table 5. The list of metabolites and their adjusted p-values from each covariate, corresponding to what is shown in Figure 11. Each “Group” of metabolites corresponds to a column in Figure 11, and significant entries are those with an adjusted p-value <= 0.05 in the corresponding covariate statistical test.
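The grouping in Table 5 can be reproduced from the adjusted p-values alone, as the sketch below shows for five rows copied from the table. The covariate keys and function name are hypothetical shorthand, and the 0.05 cutoff is assumed to match the one used for the direct comparison.

```python
from collections import defaultdict

COVARIATES = ["stage", "diabetes", "age", "smoking", "drinking", "bmi"]

# Adjusted p-values for five metabolites copied from Table 5.
rows = {
    "11353": [0.0078, 0.00042, 0.024, 0.035, 0.24, 0.27],
    "isoleucine": [0.046, 0.013, 0.024, 0.12, 0.10, 0.45],
    "18242": [0.0073, 0.0015, 0.038, 0.074, 0.10, 0.46],
    "2091": [0.021, 0.025, 0.29, 0.036, 0.19, 0.19],
    "32529": [0.13, 0.13, 0.45, 0.044, 0.040, 0.39],
}


def group_by_significant(rows, cutoff=0.05):
    """Group metabolites by the exact set of covariates in which they
    are significant, keeping only sets of two or more covariates."""
    groups = defaultdict(list)
    for metabolite, p_values in rows.items():
        key = tuple(
            c for c, p in zip(COVARIATES, p_values) if p <= cutoff
        )
        if len(key) >= 2:
            groups[key].append(metabolite)
    return dict(groups)


groups = group_by_significant(rows)
```

Each key of `groups` corresponds to one column of the UpSet plot, and the length of its list is the height of that column's bar.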

Metabolite Pancreatitis Stage Diabetes Status Age Smoking Status Drinking Status BMI
228072 0.0039 0.10 0.17 0.062 0.21 0.66
phenoxyacetic acid 0.0096 0.13 0.57 0.079 0.10 0.39
33983 0.024 0.064 0.13 0.27 0.38 0.34
345144 0.031 0.063 0.61 0.19 0.36 0.80
sucrose 0.044 0.16 0.15 0.23 0.10 0.19
fucose 0.045 0.086 0.34 0.56 0.74 0.82
myo-inositol 0.087 0.00049 0.088 0.92 0.94 0.83
121191 0.075 0.0014 0.13 0.74 0.94 0.34
1,2-anhydro-myo-inositol 0.066 0.0018 0.097 0.72 0.98 0.97
1-methylgalactose 0.063 0.0018 0.088 0.30 0.95 0.32
lactic acid 0.056 0.0055 0.088 0.24 0.34 0.14
serine 0.075 0.0058 0.052 0.27 0.39 0.21
glutamine 0.23 0.0059 0.42 0.50 0.76 0.17
N-carbamylglutamate 0.15 0.0062 0.33 0.82 0.98 0.90
109464 0.23 0.011 0.39 0.46 0.85 0.31
32284 0.075 0.015 0.19 0.33 0.85 0.43
146226 0.49 0.016 0.42 0.61 0.85 0.34
urea 0.10 0.017 0.097 0.21 0.48 0.77
glucose 0.21 0.017 0.33 0.69 0.92 0.88
indole-3-acetate 0.12 0.019 0.42 0.25 0.74 0.64
33386 0.063 0.019 0.21 0.63 0.86 0.79
citramalic acid 0.16 0.020 0.30 0.78 0.87 0.97
ethanolamine 0.25 0.020 0.088 0.29 0.38 0.20
isothreonic acid 0.071 0.020 0.39 0.090 0.28 0.53
109393 0.65 0.020 0.20 0.92 0.92 0.86
1684 0.059 0.023 0.33 0.12 0.60 0.17
61 0.056 0.025 0.22 0.21 0.50 0.88
4-hydroxyphenylacetic acid 0.13 0.025 0.33 0.14 0.80 0.46
creatinine 0.19 0.027 0.20 0.81 0.91 0.91
342512 0.059 0.031 0.16 0.13 0.60 0.67
84181 0.075 0.032 0.21 0.15 0.74 0.19
glycine 0.49 0.035 0.13 0.30 0.92 0.43
1799 0.20 0.036 0.45 0.56 0.85 0.21
231706 0.15 0.036 0.57 0.097 0.38 0.67
glucosamine 0.13 0.040 0.33 0.65 0.96 0.94
53954 0.075 0.040 0.60 0.11 0.49 0.49
32247 0.060 0.041 0.45 0.055 0.38 0.14
5900 0.071 0.041 0.16 0.067 0.28 0.34
137 0.063 0.047 0.088 0.56 0.53 0.77
12276 0.23 0.049 0.33 0.64 0.90 0.72
glycerol-3-galactoside 0.056 0.049 0.33 0.67 0.87 0.65
threitol 0.29 0.049 0.28 0.14 0.85 0.73
glycolic acid 0.094 0.049 0.15 0.095 0.11 0.48
UDP-glucuronic acid 0.11 0.050 0.16 0.26 0.16 0.21
phenylalanine 0.20 0.054 0.043 0.63 0.87 0.79
alpha-aminoadipic acid 0.089 0.070 0.088 0.0080 0.16 0.10
31664 0.087 0.14 0.33 0.036 0.46 0.19
34149 0.10 0.19 0.16 0.036 0.53 0.14
N-acetylneuraminic acid 0.096 0.29 0.51 0.044 0.38 0.21
31222 0.81 0.26 0.80 0.63 0.040 1.0
1968 0.27 0.21 0.92 0.37 0.30 0.022
4983 0.073 0.10 0.72 0.090 0.20 0.022
171295 0.14 0.13 0.51 0.11 0.10 0.036

Table 6. List of metabolites that were significant in only one covariate, by covariate and then by adjusted p-value in that covariate.

Executive Summary

  • Comparison of Green I vs Red patient groups.
  • 29 metabolites significant, some named and some not.
  • Metabolite - metabolite ratios (n = 1257) did show new metabolites over the direct comparison.
  • Plots of the raw values demonstrate that the differences are real, but the distributions are extremely wide and overlapping.
  • Multiway ANOVA showed that the other factors, namely diabetes and gender, do not seem to be associated with the changes in abundances.
  • Statistical results for the direct and ratio comparisons are provided in conwell_pancreatitis_output_tables_YYYY-MM-DD.xlsx.

References

Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society. Series B (Methodological) 57 (1): 289–300. https://www.jstor.org/stable/2346101.